Simple RNN

In this notebook, we're going to train a simple RNN to do time-series prediction. Given some input data, it should be able to generate a prediction for the next time step!

  • First, we'll create our data
  • Then, define an RNN in PyTorch
  • Finally, we'll train our network and see how it performs

Import resources and create data

The data generation process in the next cell follows these steps:

  1. Generate equally spaced values between 0 and pi (think of them as angles between 0 and 180 degrees).
  2. Apply the sine function to each of the generated values, so the output always lies between -1 and 1, the range of the sine function.
  3. The features are the sine values, and the labels are also sine values but shifted forward by one position, so each label is one time step ahead in the future.
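The three steps above can be sketched as follows; the sequence length here is an illustrative assumption, not a value taken from the notebook's code:

```python
import numpy as np

# Assumed sequence length for illustration
seq_length = 20

# 1. Equally spaced values between 0 and pi (angles from 0 to 180 degrees)
time_steps = np.linspace(0, np.pi, seq_length + 1)

# 2. Apply sine; every output lies in [-1, 1]
data = np.sin(time_steps)
data = data.reshape((seq_length + 1, 1))  # add an input-feature dimension

# 3. Features are all but the last value; labels are shifted one step ahead
x = data[:-1]
y = data[1:]
```

Because `y` is just `data` shifted by one position, each label `y[i]` is exactly the feature that follows `x[i]` in time.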

Define the RNN

Next, we define an RNN in PyTorch. We'll use nn.RNN to create an RNN layer, then add a final, fully-connected layer to get the output size that we want. An nn.RNN layer takes in a number of parameters, including:

  • input_size — the number of input features at each time step
  • hidden_size — the number of features in the hidden state
  • num_layers — the number of stacked RNN layers
  • batch_first — whether the batch dimension comes first in the input tensor

Take a look at the RNN documentation to read more about recurrent layers.
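A minimal sketch of such a model, assuming the common pattern of an nn.RNN layer followed by a fully-connected layer (the class name and argument names here are illustrative):

```python
import torch
from torch import nn

class SimpleRNN(nn.Module):
    """An nn.RNN layer followed by a fully-connected output layer."""

    def __init__(self, input_size, hidden_dim, output_size, n_layers=1):
        super().__init__()
        self.hidden_dim = hidden_dim
        # batch_first=True means inputs have shape (batch, seq_len, input_size)
        self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x, hidden):
        # r_out: (batch, seq_len, hidden_dim); hidden: (n_layers, batch, hidden_dim)
        r_out, hidden = self.rnn(x, hidden)
        # flatten so the linear layer sees (batch * seq_len, hidden_dim)
        r_out = r_out.view(-1, self.hidden_dim)
        output = self.fc(r_out)
        return output, hidden
```

Returning the hidden state alongside the output lets the caller carry memory forward across batches, which the training loop below relies on.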

Check the input and output dimensions

As a check that your model is working as expected, test out how it responds to input data.
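One way to run such a check on a bare nn.RNN layer, with illustrative sizes:

```python
import torch
from torch import nn

# Standalone dimension check; all sizes here are assumptions for illustration
rnn_layer = nn.RNN(input_size=1, hidden_size=10, num_layers=2, batch_first=True)

seq_length = 20
x = torch.randn(1, seq_length, 1)  # (batch, seq_len, input_size)

r_out, hidden = rnn_layer(x, None)  # None -> zero initial hidden state
print(r_out.shape)   # torch.Size([1, 20, 10]): (batch, seq_len, hidden_size)
print(hidden.shape)  # torch.Size([2, 1, 10]): (num_layers, batch, hidden_size)
```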

Explaining the Tensor Dimensions for the RNN

Regarding the Inputs

Regarding the Training Process and the RNN Output

At each time step:

Regarding the Hidden State

Regarding the Final Output


Training the RNN

Next, we'll instantiate an RNN with some specified hyperparameters, train it over a series of steps, and see how it performs.

Loss and Optimization

This is a regression problem: can we train an RNN to accurately predict the next data point, given a current data point?

  • The data points are coordinate values, so to compare a predicted and a ground-truth point, we'll use a regression loss: the mean squared error (MSE).
  • It's typical to use an Adam optimizer for recurrent models.
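In PyTorch, that pairing looks like the following sketch (the placeholder model and learning rate are assumptions):

```python
import torch
from torch import nn

# A tiny placeholder model stands in for the RNN here
model = nn.Linear(1, 1)

criterion = nn.MSELoss()                                   # regression loss
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # assumed learning rate

# MSE compares predictions against ground-truth targets point by point
loss = criterion(model(torch.randn(5, 1)), torch.randn(5, 1))
```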

Defining the training function

This function takes in an rnn and a number of steps to train for, and returns a trained rnn. It is also responsible for displaying the loss and the predictions every so often.

Hidden State

Pay close attention to the hidden state, here:
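A sketch of such a training loop, using a stand-in model whose names and sizes are illustrative assumptions. The key line detaches the hidden state from its computation history, so backpropagation at each step doesn't reach back through the entire training run:

```python
import numpy as np
import torch
from torch import nn

# Stand-in model: an RNN layer plus a linear output head (sizes assumed)
rnn = nn.RNN(input_size=1, hidden_size=16, batch_first=True)
fc = nn.Linear(16, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(
    list(rnn.parameters()) + list(fc.parameters()), lr=0.01)

seq_length = 20
hidden = None  # the model starts with no memory

for step in range(10):
    # generate a fresh sine-wave window; labels are one step ahead
    start = np.random.rand() * np.pi
    time_steps = np.linspace(start, start + np.pi, seq_length + 1)
    data = np.sin(time_steps).reshape(-1, 1)
    x = torch.tensor(data[:-1], dtype=torch.float32).unsqueeze(0)  # (1, seq, 1)
    y = torch.tensor(data[1:], dtype=torch.float32).unsqueeze(0)

    out, hidden = rnn(x, hidden)
    # detach the hidden state so gradients don't flow across iterations,
    # while still carrying its values forward as memory
    hidden = hidden.data
    prediction = fc(out)

    loss = criterion(prediction, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Without the detach, the computation graph would keep growing across steps and backpropagation would become progressively slower and more memory-hungry.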

After training the model for 150 steps, we can see that the predicted values (the blue points) have the same shape as the input sequences (the red points), just shifted by one time step. Now it's time to test the model on a sequence spanning 360 degrees that it has never seen before.

We can see that the first few predictions in the sequence are poor compared to the rest. The reason is that when the model starts seeing the data, it has no memory (we passed an initial hidden state of None), and it builds up its memory a little more each time it sees a new data point. To avoid this issue we use what is called priming: the aim of priming is to build up the model's memory before testing on a new sequence. We will see more about priming in the upcoming notebook about LSTM cells.
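The priming idea can be sketched as follows, using an untrained stand-in model (all names and sizes are assumptions); the point is only the flow of the hidden state, which is built up on the prime sequence and then reused for the test sequence:

```python
import numpy as np
import torch
from torch import nn

# Stand-in model (untrained), sizes assumed for illustration
rnn = nn.RNN(input_size=1, hidden_size=16, batch_first=True)
fc = nn.Linear(16, 1)

def to_input(angles):
    """Turn a 1-D array of angles into a (1, seq_len, 1) sine-wave tensor."""
    return torch.tensor(np.sin(angles), dtype=torch.float32).reshape(1, -1, 1)

# Prime sequence and test sequence, chosen so both use the same point spacing
prime = to_input(np.linspace(0, np.pi, 20))          # spacing: pi / 19
test = to_input(np.linspace(np.pi, 3 * np.pi, 39))   # 360 degrees, same spacing

with torch.no_grad():
    # 1. Run the prime sequence purely to build up the model's memory
    _, hidden = rnn(prime, None)
    # 2. Predict on the unseen sequence, starting from the primed hidden state
    out, hidden = rnn(test, hidden)
    predictions = fc(out)
```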

We can clearly see that, after building up the model's memory, the predictions are much better for the early time steps. However, how well the memory builds up depends on the length of the prime sequence and the spacing between its points. In my case here, I set the spacing between the points of the prime sequence to match the spacing between the points of the test sequence.